# Real-time Voice Interaction
MiniCPM-o 2.6 int4
The int4-quantized version of MiniCPM-o 2.6, which significantly reduces GPU VRAM usage while still supporting multimodal processing.
Text-to-Audio
Transformers Other

openbmb
4,249
42
Ultravox v0.2
MIT
Ultravox is a multimodal voice large language model built on Llama3-8B-Instruct and Whisper-small that can process both speech and text inputs.
Audio-to-Text
Transformers English

fixie-ai
792
51